{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "## Lab 1: Introduction to Python and Pandas\n", "\n", "### Packages\n", "\n", "A package in Python is analogous to an app for a smartphone, in that the package adds new functionality to Python, just like an app adds new functionality to a smartphone. Also, just like an app, we have to install each package the first time we use it, but after that, we just tell Python that we want to use it (equivalent to opening the app). \n", "\n", "Since packages take some time to download and install, we will start by installing the two packages we are using today: Matplotlib and Pandas.\n", "\n", "Run the following code. There are three ways to run code, and you can use whichever you prefer:\n", "1. Click on the arrow to the very left of the cell (this arrow will only appear when your mouse is near it).\n", "2. Click on the cell, and then click the Run button in the menu at the top. \n", "3. Click on the cell, and then press control + enter simultaneously. (If you have just been typing in the cell, you don't need to click on it first.)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "!pip install --user matplotlib" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Once you see a number appear in the [ ] to the left of the cell above, that cell has finished running and you can start running this next cell. These cells might take 5-10 minutes to finish running." 
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "!pip install --user pandas" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Math in Python\n", "\n", "Computers were originally invented to do complicated math computations, and we can also do computations in Python.\n", "\n", "Try running the code below:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "3 + 4" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Is the answer what you expect? \n", "\n", "What do you think the * symbol does? Try running the code below:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "2*5" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "\\* is used for multiplication. The addition (+), subtraction (-), and division (/) symbols are the usual ones. \n", "\n", "What is 216 divided by 8? Write the code to compute this below." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Variables\n", "\n", "A *variable* is a label or nickname that can be used to access something (a number, text, a link to a file, etc.) in the computer's memory. A variable in programming works similarly to a variable in math, but it can store or represent more than just numbers.\n", "\n", "To store the value 16 in the variable `x`, type `x = 16` below and run the code." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "To see what value is stored in the variable `x`, type `x` below and run the cell." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We can now use the variable `x`, for example, in a math expression. 
Run the code below:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "x/4" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Is the answer what you expected? \n", "\n", "Notice that the value of `x` does not change:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "x" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "If we want to save the result of `x/4`, we need to store it in a new variable, like this:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "answer = x/4" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "It looks like nothing happened, but try displaying the value in `answer` below:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Functions\n", "\n", "What does the following code do?" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "x\n", "answer" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "It only displays the value in the variable `answer`, because a Jupyter notebook only displays the value of the last line of the cell. To display the values of both `x` and `answer`, we need to use the function `print()`. " ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "print(x)\n", "print(answer)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "A *function* is a command that tells the computer to perform a specific action. For example, the `print()` function tells the computer to display whatever is inside the parentheses on the screen. 
It is the first function we'll learn.\n", "\n", "All functions (commands) are followed by parentheses, and additional information for the computer (called *parameters*) is put inside the parentheses. In the example above, `x` and `answer` are the parameters." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let's try printing a variable that we haven't assigned (given) a value. Type `print(y)` below and run the code." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We get an error! The last line `NameError: name 'y' is not defined` is the most important, and tells us that we have not defined a variable `y`.\n", "\n", "Can you define `y` to be 100 below?" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "You should have typed `y=100` and run the code. We can check that we defined the variable by typing either `print(y)` or simply `y`. Try both below:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Plotting: line plots\n", "\n", "Now we are going to make a *line plot* of the historical population of New York City and its five boroughs.\n", "\n", "Just like you need to open an app each time you want to use it, we need to import a package each time we want to use it. \n", "\n", "To import the Matplotlib package, type `import matplotlib.pyplot as plt` below and run the code." 
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "It looks like nothing happened, but we can now use functions from the Matplotlib package.\n", "\n", "To import the Pandas package, type `import pandas as pd` below and run the code." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "It looks like nothing happened, but we can now use Pandas functions. \n", "\n", "Next we want to upload our data file to Jupyter Hub. Download the file `nycHistPop.csv` from the course webpage, noting which directory it is saved in. Then click on the web browser tab \"Home\", which shows your files in Jupyter Hub. Upload `nycHistPop.csv` by clicking the Upload button on the top left. Once the file is uploaded, return to this page.\n", "\n", "Open the file `nycHistPop.csv` using TextEditor. What do you notice about the file?" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "A *comma-separated values (CSV)* file stores data from a table as a text file. Each row in the file corresponds to a row in the table, and the column values are separated by commas, which explains the name. \n", "\n", "We will next tell Python to open the file `nycHistPop.csv` and store it as a *dataframe* in the variable `pop`. A *dataframe* is similar to a table in Excel. Each column corresponds to some statistical variable (e.g., year, Manhattan population, Bronx population) and each row corresponds to an observation or piece of data (e.g., the populations in each borough in a specific year).\n", "\n", "To create the dataframe, type:\n", "`pop = pd.read_csv(\"nycHistPop.csv\",skiprows = 5)` \n", "below and run the code." 
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "What happened?\n", "\n", "Even though it looks like nothing happened, we have defined a variable `pop` that should contain the contents of `nycHistPop.csv`. To see this, type `pop` below and run the code." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "What happened? \n", "\n", "The variable `pop` now contains all the rows of `nycHistPop.csv` (except the first 5, which we skipped with the parameter `skiprows=5`). Also, all the data is arranged in a table, called a *dataframe*.\n", "\n", "How many people lived in the Bronx in 1800?" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "To plot the data, type `pop.plot(x = \"Year\")` and run the code." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The code did something, but we don't see a graph! To actually see the graph, we need to tell Jupyter to display the graphs \"inline\" by typing `%matplotlib inline` below and running the code." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now try typing `pop.plot(x=\"Year\")` again and running the code." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Which borough currently has the largest population? Which borough had the largest population in 1900?\n", "\n", "#### Challenges\n", "* What happens if you leave off the `x = \"Year\"`? 
Why?\n", "* What happens if you use the parameters `x = \"Year\", y = \"Bronx\"`?\n", "* What percentage of the total New York City population in 1898 was in Manhattan? Do any calculations in the Jupyter notebook.\n", "\n", "You can create as many code cells as you need by clicking on *Insert* in the menu, and then selecting *Insert Cell Below*. \n" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.4.8" } }, "nbformat": 4, "nbformat_minor": 2 }